Selecting Local Models in Multiple Regression by Maximizing Power
Abstract
This paper considers multiple regression procedures for analyzing the relationship between a response variable and a vector of d covariates in a nonparametric setting where both tuning parameters and the number of covariates need to be selected. We introduce an approach which handles the dilemma that with high dimensional data the sparsity of data in regions of the sample space makes estimation of nonparametric curves and surfaces virtually impossible. This is accomplished by abandoning the goal of trying to estimate true underlying curves and instead estimating measures of dependence that can determine important relationships between variables. These dependence measures are based on local parametric fits on subsets of the covariate space that vary in both dimension and size within each dimension. The subset which maximizes a signal to noise ratio is chosen, where the signal is a local estimate of a dependence parameter which depends on the subset dimension and size, and the noise is an estimate of the standard error (SE) of the estimated signal. This approach of choosing the window size to maximize a signal to noise ratio lifts the curse of dimensionality because for regions with sparsity of data the SE is very large. It corresponds to asymptotically maximizing the probability of correctly finding non-spurious relationships between covariates and a response or, more precisely, maximizing asymptotic power among a class of asymptotic level α t-tests indexed by subsets of the covariate space. Subsets that achieve this goal are called features. We investigate the properties of specific procedures based on the preceding ideas using asymptotic theory and Monte Carlo simulations and find that within a selected dimension, the volume of the optimally selected subset does not tend to zero as n → ∞ unless the volume of the subset of the covariate space where the response depends on the covariate vector tends to zero.
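The window-selection idea in the abstract can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a single covariate, a fixed center `x0`, symmetric candidate windows, and an ordinary least-squares line as the local parametric fit, whereas the paper also varies the subset dimension. The function names `local_slope_t` and `select_window` are hypothetical. The signal is the fitted local slope, the noise is its estimated standard error, and the chosen window is the one with the largest absolute ratio.

```python
import numpy as np


def local_slope_t(x, y, x0, h):
    """t-ratio (slope / SE) of a least-squares line fitted on |x - x0| <= h.

    Illustrative assumption: a simple linear fit stands in for the
    local parametric fit described in the abstract.
    """
    mask = np.abs(x - x0) <= h
    xs, ys = x[mask], y[mask]
    n = xs.size
    if n < 4:
        # Too few points in the window: treat the signal as undetectable.
        return 0.0
    xc = xs - xs.mean()
    sxx = np.sum(xc ** 2)
    if sxx == 0.0:
        return 0.0
    slope = np.sum(xc * (ys - ys.mean())) / sxx
    resid = ys - ys.mean() - slope * xc
    sigma2 = np.sum(resid ** 2) / (n - 2)  # residual variance estimate
    se = np.sqrt(sigma2 / sxx)             # standard error of the slope
    if se == 0.0:
        return 0.0 if slope == 0.0 else float(np.sign(slope)) * np.inf
    return slope / se


def select_window(x, y, x0, widths):
    """Return (h, |t|) for the candidate half-width with the largest |t|."""
    t_abs = np.array([abs(local_slope_t(x, y, x0, h)) for h in widths])
    best = int(np.argmax(t_abs))
    return widths[best], float(t_abs[best])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, size=400)
    # Hypothetical data: the response depends on x only for x < 0.5.
    y = np.where(x < 0.5, 4.0 * (0.5 - x), 0.0) + rng.normal(0.0, 0.5, size=400)
    widths = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5]
    print(select_window(x, y, x0=0.25, widths=widths))
```

In this sketch a window containing few observations yields a large standard error and therefore a small ratio, which is the mechanism the abstract credits for avoiding sparse regions.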
Similar resources
New Approach in Fitting Linear Regression Models with the Aim of Improving Accuracy and Power
The main contribution of this work lies in challenging the common practice of inferential statistics in the realm of simple linear regression for attaining a higher degree of accuracy when multiple observations are available, at least, at one level of the regressor variable. We derive sufficient conditions under which one can improve the accuracy of the interval estimations at quite affordable ...
A MODIFICATION ON RIDGE ESTIMATION FOR FUZZY NONPARAMETRIC REGRESSION
This paper deals with ridge estimation of fuzzy nonparametric regression models using triangular fuzzy numbers. This estimation method is obtained by implementing the ridge regression learning algorithm in the Lagrangian dual space. The distance measure for fuzzy numbers suggested by Diamond is used and the local linear smoothing technique with the cross-validation procedure for selecting t...
Nondestructive Determination of the Total Volatile Basic Nitrogen (TVB-N) Content Using Hyperspectral Imaging in Japanese Threadfin Bream (Nemipterus japonicus) Fillet
Background and Objectives: Considering the importance of safety evaluation of fish and seafood from capture to purchase, rapid and nondestructive methods are urgently needed by the seafood industry. This study aimed to assess the application of hyperspectral imaging (HSI: 430-1010 nm) for prediction of total volatile basic nitrogen (TVB-N) in Japanese threadfin bream (Nemipterus japonicus) fillets, ...
An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models
Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...
Building Regression Cost Models for Multidatabase Systems
A major challenge for performing global query optimization in a multidatabase system (MDBS) is the lack of cost models for local database systems at the global level. In this paper we present a statistical procedure based on multiple regression analysis for building cost models for local database systems in an MDBS. Explanatory variables that can be included in a regression model are identified ...
Publication date: 2006